Machine-Learning based Semantic Similarity Measures to Assist Discovery and Reuse of Data Exchange XML Schema

Authors

  • BUHWAN JEONG
  • Hyunbo Cho
  • Jaewook Lee
  • Boonserm Kulvatunyou
Abstract

Interoperability will remain an unsolved problem, at least until the time you read this dissertation. Enormous amounts of money have been spent because of the lack of interoperability among systems. According to NIST reports, the estimated cost of inadequate interoperability approaches a score of billions of US dollars in just two industries (i.e., the capital industry and part of the automotive industry) in the US alone. Can you imagine the worldwide total for a single year? For this reason, data-centric integration communities have discussed and agreed upon standards. That worked for a while; however, everybody has wanted more than the standards provide. This is an unbreakable chain of interoperability. Even though standard-based integration approaches have definitely failed, those experiences offer many lessons and opportunities. We recognized that standard specifications have evolved over time. The failure of such approaches is surely due to the failure to track that evolution consistently. In other words, it is not too late to define a formal scheme and supporting tools to manage and support schema (i.e., standard specification) evolution. In this sense, the dissertation first aims at defining such a scheme, namely Model Development Life Cycle (MDLC) management, within which we formally identify and clarify the supporting activities and necessary information models. More specifically, the dissertation concentrates on what is effectively the initial activity of the MDLC, i.e., the Schema Discovery activity. In fact, the core functions of schema discovery are reused in the other activities in variant forms. To this end, we designed a schema discovery engine, namely the Semantics Aware Lookup Assistant (SALA). SALA interprets a given set of data exchange requirements and finds, from a repository, the most appropriate (integration) schema(s) meeting those requirements.
Schema discovery is a new research area, but similar research has been done under the names of schema matching and schema integration. We reviewed a number of existing works and approaches in schema matching, and concluded that the mainstream approach is to measure semantic similarities among schemas, where the measures indicate the degree of semantic correspondence. The principle behind computing the semantic similarities is a convergence of two separately developed technology axes: schema matching and machine learning. Following this philosophy, the research applies several well-suited machine-learning techniques (e.g., ANN, PCA, PLS) to schema matching in …

Similar resources

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

Enhance Reuse of Standard e-Business XML Schema Documents

Ideally, e-Business application interfaces would be built from highly reusable specifications of business document standards. Since many of these specifications are poorly understood, users often create new ones or customize existing ones every time a new integration problem arises. Consequently, even though there is a potential for reuse, the lack of a component discovery tool means that the c...

A semantic similarity analysis for data mappings between heterogeneous XML schemas

One of the most critical steps to integrating heterogeneous e-Business applications using different XML schemas is schema mapping, which is known to be costly and error-prone. Past research on schema mapping has not made full use of semantic information imbedded in the hierarchical structure of the XML schema. In this chapter, we investigate the existing schema mapping approaches and propose an...

Induction on the Semantic Web

The Semantic Web is increasingly populated with instance data, nowadays often in the form of Linked Data. Consequently, machine learning and other instance driven approaches are of increasing relevance. In this special issue we have collected various inductive approaches and approaches from relational learning for solving a number of tasks. In particular, inductive methods are applied to learn ...

Publication date: 2005